DataScience Blog

Outlier Analysis - R

January 20, 2021

Please find the Outlier Analysis in R,

Convert Strings into factor numeric

for(i in 1:ncol(T1))
{
if(class(T1[,i]) ==‘factor’)
{
T1[,i] = factor(T1[,i], labels=(1:length(levels(factor(T1[,i])))))
}
}
p1 The original csv file and the file after changing strings to factor numeric is given below to understand the difference better p2 p3

Outlier Analysis

Boxplots - Distribution and Outlier Check

Extracting only numeric variables

numericindex = sapply(T1, is.numeric)
numeric
data = T1[,numeric_index]
view(numericindex)
view(numeric
data)
cnames = colnames(numericdata)
p4 p5 p6
for (i in 1:length(cnames))
{
assign(paste0(“gn”,i), ggplot(aes
string(y = (cnames[i]), x = “responded”), data = subset(T1))+ statboxplot(geom = “errorbar”, width = 0.5)+geomboxplot(outlier.colour=“red”, fill = “grey”, outlier.shape=18, outlier.size=1, notch=FALSE) + theme(legend.position=“bottom”)+labs(y=cnames[i], x=“responded”)+ggtitle(paste(“Box plot of responded for”, cnames[i])))

}
p7

** Note ** If you get an error, could not find function ggplot, please do execute below commands.
install.packages(“ggplot2”)
library(“ggplot2”)
After executing above commands, run the code.

Plotting plots together

gridExtra::grid.arrange(gn1,ncol=1)
Boxplot will look like below
p8

With this we will end up this. The Detailed outlier analysis will be updated later.

Stay Tuned for

  • Feature Selection

  • Feature Scaling

  • Sampling Techniques

And we still have python implementation of all these starts from Missing Value Analysis to Sampling Techniques!!.

sssss!!!! Wait!.. And the real Roller coaster ride starts hereafter. Yes.. Machine Learning!!! Much awaited portion!!!.. stay tuned to learn :)